fix: prevent data loss from dirty_since race in push worker#19
Open
miyannishar wants to merge 1 commit into
Open
fix: prevent data loss from dirty_since race in push worker#19miyannishar wants to merge 1 commit into
miyannishar wants to merge 1 commit into
Conversation
The push worker unconditionally cleared dirty_since after wait_until_done, creating a 30-second window where concurrent writes could be silently lost: 1. Push worker sends version A, enters wait_until_done (up to 30s) 2. User writes version B → dirty_since updated to T2 3. wait_until_done returns → set_dirty_since(ino, None) destroys T2 4. Pull reconciler sees dirty_since=None → overwrites with stale version A 5. User's version B is silently lost Fix: Replace unconditional set_dirty_since(None) with a compare-and-swap that captures dirty_since at job claim time and only clears it if unchanged. If a concurrent write updated dirty_since during the wait window, the CAS refuses to clear it, and the pull reconciler correctly skips the overwrite. Also fixes: wait_until_done return value was being ignored — the code stamped last_status='done' even on timeout. Now only stamps 'done' when the server actually confirmed completion. Changes: - Add Db::clear_dirty_since_if_unchanged() (CAS method) - Add dirty_since_at_claim field to PushJob (snapshot at claim time) - Fix all 3 call sites in push.rs (PATCH, POST, binary upload) - Handle wait_until_done timeout (defer to inflight poller) - Add regression test proving the fix prevents data loss - Add happy-path test proving CAS clears normally when unchanged
fd4a633 to
92b4503
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix: Prevent data loss from
dirty_sincerace in push workerFixes #18
The Problem
The push worker unconditionally clears
dirty_since = Noneafterwait_until_done()returns (up to 30 seconds). If the user writes to the same file during that window:dirty_since = T2set_dirty_since(ino, None)-> destroys T2dirty_since = None-> overwrites local file with stale server contentThe Fix
Compare-and-swap instead of unconditional clear:
Db::clear_dirty_since_if_unchanged(ino, expected_ms)- New CAS method that only clearsdirty_sinceif it still matches the value captured at job claim time. If a concurrent write updated it, the CAS fails and the protection flag is preserved.PushJob::dirty_since_at_claim- New field that snapshotsdirty_sincewhen the job is claimed from the queue, providing the "expected" value for the CAS.wait_until_donereturn value - Previously ignored. Now only stampslast_status = "done"when the server actually confirmed completion; on timeout, defers to the inflight poller.Test Results
All 160 tests pass, including:
test_dirty_since_cas_prevents_data_loss- Simulates the full race sequence and verifiesreconcile_onereturnsSkippedDirty(notUpdated)test_dirty_since_cas_clears_when_unchanged- Verifies the CAS clears normally when no concurrent write occurs